Abstract
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur eget porta erat. Morbi consectetur est vel gravida pretium. Suspendisse ut dui eu ante cursus gravida non sed sem. Nullam sapien tellus, commodo id velit id, eleifend volutpat quam. Phasellus mauris velit, dapibus finibus elementum vel, pulvinar non tellus. Nunc pellentesque pretium diam, quis maximus dolor faucibus id. Nunc convallis sodales ante, ut ullamcorper est egestas vitae. Nam sit amet enim ultrices, ultrices elit pulvinar, volutpat risus.
Biological networks are representations of molecular interactions in the cell. These networks are constructed by collecting biochemical and genetic interactions from different kinds of biological entities, an approach that has improved drastically in the last decades with the advent of modern high-throughput methods. However, there are still many limitations for the gathering and analysis of biological networks, specifically for modelling network evolution across and within species. This is because many biological networks have poor quality or are incomplete, and because the number of organisms for which network data is available is still very limited (Jin et al. 2013; Cusick et al. 2005; Ghadie, Coulombe-Huntington, and Xia 2018). Despite these limitations, many advances have been made in characterizing static networks, and in developing a theoretical framework for studying their evolution across species.
Biological networks evolve as nodes and edges are added or lost in the network. This can be the result of different processes affecting network content, such as gene duplication, gene loss, pseudogenization, HGT or WGD (Wagner 2003; Cork and Purugganan 2004), or a result of processes affecting quantitative properties, including non-synonymous substitutions in the nodes affecting their function (e.g. mutations affecting the binding of ligands, protein domains or DNA motifs may result in changes in metabolic processes, signaling pathways or gene expression regulation, among others) (Jensen 1976; Ghadie, Coulombe-Huntington, and Xia 2018). These evolutionary events remove old connections and generate new ones, in a process that may be random. However, we also know that not all motifs are equally abundant in biological networks (Picard et al. 2008). This may be the result of natural selection removing deleterious connections and favoring advantageous ones. In addition to this, rewiring rates vary depending on several factors. For example, the type of network may have an effect in rewiring rates, since some networks networks such as gene regulatory networks tend to rewire at faster rates than more constrained networks such as metabolic networks (Shou et al. 2011).
Rewiring rates also vary within networks, such as between different network modules and between nodes depending on their importance for the cell’s survival and reproduction, and the essentiality of a node or module may be a constraining factor for network evolution. Previous studies have shown an inverse relationship between highly connected proteins and their rate of substitution (Fraser et al. 2002), possibly due to strong purifying selection acting on the interfacial sites of interacting proteins (Zotenko et al. 2008).
Despite these advances at studying the connections of individual nodes and global network rewiring rates, the rate at which quantitative differences between networks accumulate as a function of species divergence remains relatively unexplored. In particular, it is unknown whether evolutionary rates of inter-node connectivity are correlated with inter-node network measures, such as the minimum path length or average node degree between nodes.
In this paper, we use drug-drug interactions (DDIs as a quantitative proxy measurement of inter-node connectivity, since it has been shown that DDIs are partially dependent on the underlying network topology between targets (Lehar et al. 2007; Yeh et al. 2009). Drug-drug interactions (DDIs) occur when the effect of two or more drugs is significantly stronger or weaker than their expected combined effect, respectively named synergies and antagonisms (Cowen and Steinbach 2008). DDIs are used in the development of novel pharmacological treatments with higher efficiencies at lower doses with the aim of reducing the evolution of drug resistance (Cowen and Steinbach 2008). The reasons why a DDI occurs are varied, but it has been shown that a DDI can occur as a result of factors such as (1) the topology of the underlying network between drug targets, (2) the essentiality of the metabolites blocked by the perturbation, and the inhibition efficiency of the drug on the drug target (Yeh et al. 2009). For example, synergies can occur if drugs act on parallel pathways, where the individual effect of each drug is small and an alternative pathway can compensate for the effect of a single drug (Yeh et al. 2009). Alternatively, antagonistic interactions can occur mainly in two different ways: by causing a partial loss of function in two parallel pathways of an essential product, or as a result of drugs acting sequentially along the same pathway (Yeh et al. 2009). This relation between DDIs and network topology has been further explored using experiments and metabolic flux simulations in yeast, suggesting that specific combinations of perturbations in the network result in different quantifiable interaction scores (Lehar et al. 2007). More recently, it has been shown that most synergistic interactions are the result of drugs targeting the same cellular process, while antagonistic interactions are the result of drugs targeting different processes (Brochado et al. 2018).
Only a few studies have explored interspecific variation of DDIs (Spitzer et al. 2011; Robbins et al. 2015; Brochado et al. 2018), but they have shown that DDI scores could be scalable to include higher numbers of species and strains, allowing for the evaluation of network evolution hypotheses within and between species. A high throughput study of DDIs in gram-negative bacteria has shown that antagonistic interactions are more common, and are almost exclusively detected on drugs that target different cellular processes, while synergies are more conserved across species, and they are more frequent in drugs that target the same process (Brochado et al. 2018). In that study, 70% of the DDIs detected were species-specific, while 20% were strain-specific.
Given these previous findings, we had two related questions. First, do DDI scores diverge as a function of species divergence, consistent with neutral processes acting on the underlying network structure? Or is there substantial constraint on some DDI scores, consistent with evolutionary constraint on the drug targets, their connectivity, and the drugs’ mode of action?
Second, do synergistic interactions evolve at a slower rate than other types of drug interactions, and antagonistic interactions evolve at a faster rate? These differences in rates would be a result of synergistic interactions taking place in local neighborhoods of nodes, while antagonistic interactions act on distant network neighborhoods, which appear to evolve more slowly because of greater network redundancy between distant nodes. Indeed, if we were to map DDIs to protein targets (when known), are the drug targets of synergistic drug interactions closer in the network than additive and antagonistic interactions? And, more generally, are more closely connected nodes subject to higher rates of evolution than more distantly related nodes?
To address these questions, we used the most complete available dataset of species specific and strain specific DDIs (Brochado et al. 2018). We modelled these DDIs under a phylogenetic comparative framework and applied a multivariate brownian motion model to estimate the evolutionary rate of interaction scores for different clusters of DDIs in six strains (three species) of gram-negative bacteria. We also mapped DDIs to their putative protein targets to evaluate them in known biological networks. We show that DDIs can be used as an effective proxy to evaluate macroevolutionary implications and the role of network evolution in interspecific variation.
We obtained DDI scores from a previous study (Brochado et al. 2018) that assessed almost 3000 combinations of 79 different compounds on six strains of three species of gram-negative bacteria (pae: P. aeruginosa PAO1, pau: P. aeruginosa UCBPP-PA14, stm: S. enterica subsp. enterica serovar Typhimurium LT2, seo: S. enterica subsp. enterica serovar Typhimurium 14028S, ecr: E. coli O8 IAI1 (commensal), ebw: E. coli K-12 BW2952). In particular, we used their following datasets, tables ED09C for modelling DDI evolution, and Sup. Tables 1 and 2 to compare different categories of drugs. We constructed phylogenetic trees of the six strains using PhySpeTree to obtain estimate branch lengths (Fang et al. 2019). We used concatenated alignments of 35 highly conserved protein sequences (HCP), and specified the iqtree pipeline within PhySpeTree.
To reduce the dimensionality of the DDI data, we classified DDIs into clusters using tSNE in the R packages bigMap (Garriga and Bartumeus 2018) and bigmemory (Kane, Emerson, and Weston 2013), with the following parameters (threads=80, layers=80, rounds=9). Dimensionality reduction was performed to increase robustness of rate estimators given the small number of species used. Our tSNE analysis was performed for a range of perplexity values between 50 and 2120. DDIs were clustered by similarity across strains, treating each strain as a dimension, thus DDIs that are closer to each other across strains would cluster together under tSNE. The clustering output was visualized and evaluated based on the stability and plateauing of cost and effect size curves. We obtained stable solutions between 700 and 900 perplexity values with this approach. In order to have a more accurate prediction of perplexity, we repeated the previous approach on this smaller range of values. In both cases, the pakde algorithm was applied with perplexity of 1/3 the respective tSNE perplexity. In the range explored, we obtained two optimal solutions at perplexities of 706 and 825, with 13 and 4 clusters respectively. In order to find what solution was optimal, we tested for modularity in the data using the function phylo.modularity within the package geomorph (Adams et al. 2016), which indicated that a perplexity of 706 results in more modular clustering. Thus, we fitted a multivariate brownian motion model to the most modular clustering pattern and calculated the evolutionary rates per cluster using compare.multi.evol.rates, within the package geomorph (Adams et al. 2016). These cluster rates were used as approximations for DDI evolutionary rates, for the DDIs in each of the clusters.
Each drug was identified to a unique Pubchem and CHEMBL IDs using webchem (Szöcs et al. 2020), which were used to retrieve their mode of action from IUPHAR (Armstrong et al. 2020). Similarly, we compared our targets to a previously published dataset on drugs and drug targets (Santos et al. 2017). We identified unique Uniprot IDs and KO IDs for each target protein, and converted these IDs into E. coli Uniprot IDs using KEGG (Kanehisa and Goto 2000). We didn’t include in the analysis drugs whose mode of action was unknown, or that had non-protein molecules as targets, such as small molecules, RNA or DNA.
Biological networks of E. coli were downloaded from EcoliNet (Kim et al. 2015). EcoliNet networks are constructed using inferred links based on different factors such as: co-citation (CC), co-expression (CX), co-occurrence of protein domains (DC), similar genomic context of bacterial orthologs (GN), high-throughput protein-protein interactions (HC), small/medium-scale protein-protein interactions (LC), and similarity of phylogenetic profiles (PG).
For each of the networks we calculated the average path length, and node degree distributions. In addition, the minimum distance between each of the nodes in the network was calculated, as well as the node degree (number of connection per node), and the k-edge connectivity between each pair of nodes (i.e. the minimum number of edges that can be remove to disconnect the nodes). We used the R package igraph (Csardi and Nepusz 2006) to calculate these values in each one of the biological networks and for each node or pair of nodes. We also generated an adjacency matrix, which contains information on whether two nodes are connected directly by an edge or not. In addition, we used k-edge connectivity as a proxy for connectedness between proteins (i.e. a pair with k-edge connectivity equal to zero is disconnected, proteins with k-edge connectivities different than zero are connected). Connectedness is used due to the limitations of adjacency in describing the overall connection between nodes.
Code availability. The code used for data analysis is available from https://github.com/phylogenomic/ch3-netbio
We generated a phylogenetic tree based on highly conserved proteins (Fig. 1A, which shows the evolutionary relations between the strains. Indeed, the tree match the phylogenetic relations between each of these species. For example, E. coli and S. enterica are more closely related, being both members of the family Enterobacteriaceae. In contrast, Pseudomonas belongs to the family Pseudomonadaceae, which is estimated to have separated from Enterobacteriaceae approximately 1481 MYA (Hedges and Kumar 2009). In parallel, We examined DDI variation among these six strains using PCA, and found that Pseudomonas and non-Pseudomonas strains differentiate well along the first principal component axis, which explains 47% of the variance (Fig. 1C). The second axis distinguishes between E. coli and Salmonella strains and explains 22% of the variance. Strains of the same species group closely together in the PCA space, indicating that they share similar global DDI responses.
We then performed hierarchical clustering of the six strains based on their DDI scores to obtain a chemical distance tree among strains (“Strains” in Fig. 1B). Both, our sequence-based phylogeny and the DDI-based tree yielded the same relations between strains. In order to test whether DDI scores (“chemical distance”) diverge as a function of species divergence, we used linear and logarithmic regressions, which measured the amount of DDI variation explained by sequence divergence (Manhattan distance on the phylogeny). We obtained an R2 value of 0.406 for logarithmic regression (Fig. 1D), and an R2 of 0.733 for linear regression, after a log transformation of the phylogenetic distance (Fig. 1E). These results suggest that DDIs, and their underlying network basis, diverge as a function of sequence divergence. Chemical divergence plateaus at higher phylogenetic divergence values; this type of saturation has been previously reported in models of network evolution for rewiring rates across species (Shou et al. 2011).
An examination of clustering across DDIs reveals DDI clusters that are conserved, or vary similarly, across strains (“Drug-Drug Interactions” in Fig. 1B). Here, each of the DDIs was classified using tSNE into 13 different clusters based on their similarity across all strains. Most DDIs that are highly synergistic in all species (shown in blue) group together in 13, while some antagonistic DDIs are clustered into group 2. We categorized each DDIs by whether the two drugs belong to the same drug category, target the same cellular process, or have the same use. We observed that highly synergistic DDI across all species tend to occur when both drugs belong to the same chemical category and target the same cellular process, as it can be seen from the accumulation of red bars in cluster 13. This results agrees well with the analysis done previously on the correlations between synergies and interaction types (Brochado et al. 2018).
Figure 1: Drug-drug interaction (DDI) chemical distance scales with phylogenetic distance. A. Maximum likelihood phylogeny of six strains based on highly conserved proteins. B. Heatmap of DDI by strain data. Hierarchical clustering based on euclidean distances was performed for strains (columns), and DDIS (rows) after clustering DDIs by similarity using tSNE. C. Phylomorphospace plot combining a PCA of the DDI data with the phylogeny; principal components 1 and 2 are shown. D. DDI euclidean distances between strains scale as a function of phylogenetic distance. This relationship saturates at large phylogenetic distances. Intraspecific comparisons (in blue) have the lowest amount of sequence and chemical divergence, while comparisons with Pseudomonas are the most distant in terms of sequence and chemical divergence. E. Same comparison shown as a linear relationship when phylogenetic distance is log transformed.
Out of the 79 compounds analyzed, we were able to identify protein targets for 23 compounds using a database developed by Santos et al. (2017). These compounds had a total of 26 target proteins as identified by their unique protein IDs in E. coli, where the most common target category was bacterial penicillin binding protein, a group involved in the biosynthesis of bacterial cell walls. Other target categories mapped include ribosomal, DNA polymerase, topoisomerase, thymidylate synthase and mitochondrial glycerol-3-phosphate (Sup. Table 1). (Fig. 2).
Figure 2: Graphical representation of the PPI network of E. coli, as determined by small and medium scale experiments (Kim et al. 2015). Each node represents a unique protein in E.coli (Uniprot ID), while red nodes are target proteins identified as participating in DDIs in our analysis. This network contains 767 nodes with an average path length of 4.9.
We measured the length of minimum distance paths between all targets in the network, and found that in E.coli synergistic interactions have the shortest average distance, followed by additive interactions, and antagonisms (Fig. 3). This result is consistent with our previous expectations, given that synergistic interactions tend to be more common between drugs that target the same cellular process, and thus should be closer to each other in biological networks. In addition to this, we repeated this analysis comparing interaction type with two other metrics, k-edge connectivity (Sup. Fig. 1) and average node degree (Sup. Fig. 2). We didn’t find a consistent pattern that showed clear differences between synergistic, additive and antagonistic interactions, suggesting that there may not be clear differences between different types of drug interactions for connectivity or node degree between targets.
Figure 3: Differences in minimum path length between targets for different types of drug interactions in E. coli for different biological networks. Synergistic interactions have a lower average than additive interactions, followed by antagonisms.
We sought to estimate the evolutionary rate of DDI change. However, there is low power with only six tips to estimate rate shifts for any individual DDI. Instead, we used tSNE to detect clusters of DDIs that behave similarly across species, allowing for increased power within each cluster. From this analysis we obtained two different optimal solutions for each of the perplexity ranges explored, one clustering solution with perplexity 706 for the large range analysis (Sup. Document 1) and another solution with perplexity 825 for the small range analysis (Sup. Document 2). These yielded 13 (Fig. 4A) and 4 (Fig. 4B) distinct DDI clusters, respectively. We didn’t detect any clear differentiation among clusters by drug category or cellular processes targeted by each drug (See Supplementary tSNEplots). We conducted 1000 random permutations of the phylo modularity test to determine the optimal clustering solution out of the two obtained. The test supported the 13 cluster solution as the one with the strongest modular signal (p value=0.001, effect size=-26.3594, CR=0.9231, CI= {0.8643,1.0121}) over the 4 clusters (p value=0.001, effect size=-13.9241, CR=0.985, CI={0.9544,1.0049}) (Adams et al. 2016), and we therefore use that going forward.